METHOD AND APPARATUS FOR VIDEO FRAME SEQUENCE-BASED
OBJECT TRACKING

BACKGROUND OF THE INVENTION 
RELATED APPLICATIONS

The present invention is related to and claims priority from US
provisional patent application serial number 60/354,209 titled ALARM
SYSTEM BASED ON VIDEO ANALYSIS, filed 6 February 2002. The
present invention also claims priority from and is related to PCT application
serial number PCT/IL02/01042 titled SYSTEM AND METHOD FOR VIDEO
CONTENT-ANALYSIS-BASED DETECTION, SURVEILLANCE, AND
ALARM MANAGEMENT, filed 24 December 2002.

FIELD OF THE INVENTION 
The present invention relates to video surveillance systems in
general, and more particularly to video frame sequence-based object tracking
in video surveillance environments.

DISCUSSION OF THE RELATED ART 
Existing video surveillance systems are based on diverse automatic
object tracking methods. Object tracking methods are designed to process a
captured sequence of temporally consecutive images in order to detect and
track objects that do not belong to the "natural" scene being monitored. Current
object tracking methods typically separate the objects from the background
(by delineating or segmenting the objects) and determine the motion vectors
of the objects across the sequence of frames in accordance with the spatial
transformations of the tracked objects. One drawback of the current methods
is their inability to track static objects for a lengthy period of time. Thus,
a short interval after a previously dynamic object has ceased moving, the
tracking of that object is effectively lost. An additional drawback of the
current methods is their inability to handle "occlusion" situations, such as
where the tracked objects are occluded (partially or entirely) by other objects
temporarily passing through, or permanently located between, the image
acquiring devices and the tracked object.

There is a need for an advanced and enhanced surveillance, object
tracking and identification system. Such a system would preferably automate
the procedure concerning the identification of an unattended object. Such a
system would further utilize an advanced object tracking method that would
provide the option of tracking a non-moving object for an operationally
effective period and would continue tracking objects in an efficient manner
even where the tracked object is occluded.

SUMMARY OF THE PRESENT INVENTION

One aspect of the present invention regards an apparatus for the analysis
of a sequence of captured images covering a scene, for detecting and tracking
moving and static objects and for matching the patterns of object behavior in
the captured images to object behavior in predetermined scenarios. The
apparatus comprises at least one image sequence source for transmitting a
sequence of images to an object tracking program, and an object tracking
program. The object tracking program comprises a pre-processing application
layer for constructing a difference image between a currently captured video
frame and a previously constructed reference image, an objects clustering
application layer for generating at least one new or updated object from the
difference image and at least one existing object, and a background updating
application layer for updating at least one reference image prior to processing
of a new frame.

A second aspect of the present invention regards a method for the analysis
of a sequence of captured images showing a scene, for detecting and tracking
at least one moving or static object and for matching the patterns of the at least
one object's behavior in the captured images to object behavior in predetermined
scenarios. The method comprises capturing at least one image of the scene,
pre-processing the captured at least one image and generating a short term
difference image and a long term difference image, clustering the at least one
moving or static object in the short term difference and long term difference
images, and generating at least one new object and at least one existing object.

BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully
from the following detailed description taken in conjunction with the drawings
in which:

Fig. 1 is a schematic block diagram of the system architecture, in 
accordance with a preferred embodiment of the present invention; 

Fig. 2 is a high-level block diagram showing the application layers
of the object tracking apparatus, in accordance with the preferred embodiment
of the present invention;

Fig. 3 is a block diagram illustrating the components of the
configuration layer, in accordance with the preferred embodiment of the
present invention;

Fig. 4A is a block diagram illustrating the components of the
pre-processing layer, in accordance with the preferred embodiment of the present
invention;

Fig. 4B is a block diagram illustrating the components of the
clustering layer, in accordance with the preferred embodiment of the present
invention;

Fig. 5A is a block diagram illustrating the components of the scene
characterization layer, in accordance with the preferred embodiment of the
present invention;

Fig. 5B is a block diagram illustrating the components of the
background update layer, in accordance with the preferred embodiment of the
present invention;

Fig. 6 is a block diagram showing the data structures associated with
the object tracking apparatus, in accordance with a preferred embodiment of
the present invention;

Fig. 7 illustrates the operation of the object tracking method, in
accordance with the preferred embodiment of the present invention;

Fig. 8 describes the operation of the reference image learning
routine, in accordance with a preferred embodiment of the present invention;

Fig. 9 shows the input and output data structures associated with the
pre-processing layer, in accordance with a preferred embodiment of the present
invention;

Figs. 10A, 10B and 10C describe the operational steps associated
with the clustering layer, in accordance with the preferred embodiment of the
present invention;

Fig. 11 illustrates the scene characterization, in accordance with the
preferred embodiment of the present invention;

Fig. 12 illustrates the background updating, in accordance with the 
preferred embodiment of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

An object tracking apparatus and method for the detection and
tracking of dynamic and static objects is disclosed. The apparatus and method
may be utilized in a monitoring and surveillance system. The surveillance
system is operative in the detection of potential alarm situations via a recorded
surveillance content analysis and in the management of the detected unattended
object situation via an alarm distribution mechanism. The object tracking
apparatus supports the object tracking method, which incorporates a unique
method for detecting, tracking and counting objects across a sequence of
captured surveillance content images. Through the operation of the object
tracking method the captured content is analyzed, and the results of the analysis
provide the option of activating in real time a set of alarm messages to a set of
diverse devices via a triggering mechanism. In order to provide the context in
which the object tracking apparatus and method are useful, several exemplary
associated applications will be briefly described. The method of the present
invention may be implemented in various contexts such as the detection of
unattended objects (luggage, vehicles or persons), identification of vehicles
parking or driving in restricted zones, access control of persons into restricted
zones, prevention of loss of objects (luggage or persons) and counting of
persons, as well as in police and fire alarm situations. In like manner the
object tracking apparatus and method described herein may be useful in a
myriad of other situations and as a video objects analysis tool.

In the preferred embodiments of the present invention, the
monitored content is a stream of video images recorded by video cameras,
captured and sampled by a video capture device, and transferred to a video
processing unit. Each part of this system may be located in a single device or in
separate devices located in various locations and inter-connected by hardwire
or via wireless connection over local, wide or other networks. The video
processing unit performs a content analysis of the video frames, where the
content analysis is based on the object tracking method. The results of the
analysis could indicate an alarm situation. In other preferred embodiments of
the invention, diverse other content formats are also analyzed, such as thermal
based sensor cameras, audio, wireless linked cameras, data produced from
motion detectors, and the like.

An exemplary application that could utilize the apparatus and
method of the present invention concerns the detection of unattended objects,
such as luggage in a dynamic object-rich environment, such as an airport or
city center. Other exemplary applications concern the detection of a vehicle
parked in a forbidden zone, or the extended-period presence of a non-moving
vehicle in a restricted-period parking zone. Forbidden or restricted parking
zones are typically associated with sensitive traffic-intensive locations, such as
a city center. Still other applications that could use the apparatus and method
include the tracking of objects such as persons involved in various scenario
models, such as a person leaving the vehicle away from the terminal, which may
constitute a suspicious (unpredicted) behavioral pattern. In other possible
applications, the apparatus and method of the present invention can be
implemented to assist in locating lost luggage and to restrict access of persons
or vehicles to certain zones. Yet other applications could regard the detection
of diverse other objects in diverse other environments. The following
description is not meant to be limiting and the scope of the invention is
defined only by the attached claims. Several such applications are described
in detail in related PCT patent application serial number PCT/IL02/01042
titled SYSTEM AND METHOD FOR VIDEO CONTENT-ANALYSIS-BASED DETECTION,
SURVEILLANCE, AND ALARM MANAGEMENT, filed 24 December
2002, the content of which is incorporated herein by reference.

The method and apparatus of the present invention are operative in
the analysis of a sequence of video images received from a video camera
covering a predefined area, referred to herein below as the video scene. In one
example it may be assumed that the object monitored is a combined object
comprising an individual and a suitcase, where the individual carries the
suitcase. The combined object may be separated into a first separate object and
a second separate object. It is assumed that the individual (second object)
leaves the suitcase (first object) on the floor, a bench, or the like. The first
object remains in the video scene without movement for a pre-defined period of
time. It is assumed that the suitcase (first object) was left unattended. The
second object exits the video scene. It is assumed that the individual (second
object) left the video scene without the suitcase (first object) and is now about
to leave the wider area around the video scene. Following the identification of
the previous sub-events, referred to collectively as the video scene
characteristics, the event will be identified by the system as a situation in
which an unattended suitcase was left in the security-sensitive area. Thus, the
unattended suitcase will be considered a suspicious object. Consequently, the
system of the present invention generates, displays, and/or distributes an alarm
indication.
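
By way of non-limiting illustration only, the sequence of sub-events in the suitcase example might be checked as in the following Python sketch. The names `SceneState`, `is_unattended` and `min_static_seconds` are hypothetical and do not appear in the specification; the sketch merely assumes that the three sub-events (separation, a pre-defined stationary period, and exit of the second object) are reported by earlier processing.

```python
from dataclasses import dataclass


@dataclass
class SceneState:
    """Hypothetical summary of the sub-events observed so far."""
    separated: bool = False        # combined object split into two objects
    static_seconds: float = 0.0    # time the first object has not moved
    owner_in_scene: bool = True    # second object still in the video scene


def is_unattended(state: SceneState, min_static_seconds: float) -> bool:
    """Return True when all three sub-events of the example have occurred."""
    return (
        state.separated
        and state.static_seconds >= min_static_seconds
        and not state.owner_in_scene
    )


if __name__ == "__main__":
    state = SceneState(separated=True, static_seconds=90.0, owner_in_scene=False)
    if is_unattended(state, min_static_seconds=60.0):
        print("alarm: unattended object")
```

The threshold is passed in rather than hard-coded because the specification leaves the stationary period as a pre-defined, user-configurable value.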

Likewise, in an alternative embodiment a first object, such as a suitcase or
person, is already present and monitored within the video scene.
Such an object can be lost luggage located within the airport. Such an object
can be a person monitored. The object may merge into a second object. The
second object can be a person picking up the luggage, another person whom the
first person joins, or a vehicle that the first person enters. The first object
(now merged with the second object) may move from its original position and exit
the scene or move in a predetermined prohibited direction. The application
will provide an indication to a human operator. The indication may be oral,
visual or written. The indication may be provided visually to a screen or
delivered via communication networks to officers located at the scene or
off-premises, or via dry contact to an external device such as a siren, a bell, a
flashing or revolving light and the like. An additional exemplary application
that could utilize the apparatus and method of the present invention regards the
detection of vehicles parked in restricted areas or moving in restricted lanes.
Airports, government buildings, hotels and other institutions typically forbid
vehicles from parking in specific areas or driving in restricted lanes. In some
areas parking is forbidden all the time, while in other areas parking is allowed
for a short period, such as several minutes. The second exemplary application
is designed to detect vehicles parking in restricted areas for more than a
pre-defined number of time units and generates an alarm when identifying an
illegal parking event of a specific vehicle. In another preferred embodiment the
system and method of the present invention can detect whether persons
disembark or embark a vehicle in predefined restricted zones. Other exemplary
applications can include the monitoring of persons and objects in city centers,
warehouses, restricted areas, borders or checkpoints and the like.

It would be easily perceived that for the successful operation of the
above-described applications an object tracking apparatus and an object
tracking method are required. The object tracking method should be capable of
detecting moving objects, tracking moving objects and tracking static objects,
such as objects that are identified as moving and subsequently identified as
non-moving during a lengthy period of time. In order to match the patterns of
object behavior in the captured image sequences to the patterns of object
behavior in the above-described scenarios, the object tracking method should
recognize linked or physically connected objects, recognize the
separation of the linked objects, and track the separated objects while retaining
the historical connectivity states of the objects. The object tracking apparatus
and method should further be able to handle occlusions where the tracked
objects are occluded by one or more separate objects temporarily,
semi-permanently or permanently.

Referring to Fig. 1, the image sequence sources 12 are one or more
video cameras operating in a security-wise sensitive environment and covering a
specific pre-defined visual area that is required to be monitored. The area
monitored can be any area, preferably in a transportation area, including an
airport, a city center, a building, and restricted or non-restricted areas within
buildings or outdoors. The image sequence sources 12 could include analog
devices and/or digital devices. The images provided by the image sequence
sources could include normal light, infrared, temperature, or any other form of
radiation. The image sequence sources 12 continuously acquire and transmit
sequences of video images and provide the images simultaneously to an image
sequence display device 20 and to a computing and storage device 15. The
display device 20 could be a video terminal, which is operated by a human
operator, or any other display device including a display device located on a
mobile or hand held device. Alarm triggers are generated by the object tracking
program 14 installed in the computing and storage device 15 in order to
indicate an alarm situation to the operator of the display device 20. The alarm
may be generated in the form of an audio or any other indication. The image
sequence sources 12 transmit sequences of video images to an object tracking
program 14 via suitably wired connections. The images could be provided
through an analog interface, a digital interface or through a Local Area
Network (LAN) interface or Wide Area Network (WAN), IP, wireless, or
satellite connectivity. The computing and storage device 15 could be an
external computing platform, such as a personal computer (PC), a UNIX
workstation or a mainframe computer having appropriate processing and
storage units, or dedicated hardware such as a DSP based platform. It is
contemplated that future hand held devices will be powerful enough to also
implement device 15 there within. The device 15 could also be an array of
integrated circuits with built-in digital signal processing (DSP) and storage
capabilities coupled directly to the image sequence sources 12.

WO 03/067884    PCT/IL03/00097

The device 15 includes a set of object tracking routines constituting the
object tracking program 14 and a set of object tracking control data
structures 16. The object tracking program 14, in association with the object
tracking control data structures 16, receives the image sequence from the image
sequence sources 12 and processes the image sequence in order to detect and to
track objects therein. Consequent to the detection of pre-defined
spatio-temporal patterns of behavior associated with the tracked objects across
the image sequences, appropriate alarm triggers are generated and transmitted
to the display device 20.

Still referring to Fig. 1, the object tracking program 14 and the
associated control data structures 16 could be installed in distinct platforms
and/or devices distributed randomly across a Local Area Network (LAN) that
could communicate over the LAN infrastructure or across Wide Area Networks
(WAN). One example is a Radio Frequency Camera that transmits composite
video remotely to a receiving station; the receiving station can be connected to
other components of the system via a network or directly. The program 14 and
the associated control data structures 16 could be installed in distinct platforms
and/or devices distributed randomly across very wide area networks such as the
Internet. Various forms of communication between the constituent parts of the
system can be used. Such can be a data communication network, which can be
connected via landlines or wireless or like communication devices and which can
be implemented via TCP/IP protocols and like protocols. Other protocols and
methods of communications, such as cellular, satellite, low band, and high band
communications networks and devices, will readily be useful in the
implementation of the present invention. The program 14 and the associated
control data structures 16 could further be co-located on the same computing
platform or distributed across several platforms for load balancing, redundancy
considerations, back-up in the case of equipment failure, and the like. Although
on the drawing under discussion only a single image sequence source and a
single computing and storage device are shown, it will be readily perceived that
in a realistic environment a plurality of image sequence sources could be
connected to a plurality of computing and storage devices. Moreover, two
image sequence sources each capturing a slightly different scene may provide a
stereo image sequence source. Likewise, a multiplexed image sequence source
from a plurality of image capturing devices may be used. The object tracking
apparatus comprises an object tracking program and associated object tracking
control data structures.

Referring now to Fig. 2, which is a high-level block diagram
showing the application layers of the object tracking apparatus of the present
invention, the object tracking program 14 includes several application layers.
Each application layer is a group of logically and functionally linked computer
program components responsible for different aspects of the application within
the apparatus of the present invention. The object tracking program 14 includes
a configuration layer 38, a pre-processing layer 42, an objects clustering
layer 44, a scene characterization layer 46, and a background updating layer 48.
Each layer is a computer program executing within the computerized
environment shown in detail in association with the description of Fig. 1. The
configuration layer 38 is responsible for the initialization of the apparatus of
the present invention in accordance with specific user-defined parameters. The
pre-processing layer 42 is operative in constructing difference images between
a currently captured video frame and previously constructed reference images.
The objective of the objects clustering layer 44 is to generate new and/or
updated objects from the difference images and the existing objects. The scene
characterization layer 46 uses the objects generated by the objects clustering
layer 44 to describe the monitored scene. The layer 46 also includes a
triggering mechanism that compares the behavior pattern and other
characteristics of the objects to pre-defined behavior patterns and
characteristics in order to create alarm triggers. The background updating layer
48 updates the reference images for the processing of the next frame. A more
detailed description of the structure and functionality of the application layers
will be provided herein under in association with the following drawings.
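
As a minimal sketch only, the per-frame ordering of the layers described above might be expressed in Python as follows. The function names `make_layer` and `process_frame`, and the use of a dictionary as shared context, are assumptions made for illustration; the placeholder layers merely record the order in which they run and perform no image processing.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Layer = Callable[[Context], Context]


def make_layer(name: str) -> Layer:
    """Build a placeholder layer that only records that it ran."""
    def layer(ctx: Context) -> Context:
        ctx.setdefault("trace", []).append(name)
        return ctx
    return layer


# Per-frame order taken from the description: the configuration layer 38
# runs once at initialization and is therefore not part of this list.
PER_FRAME_LAYERS: List[Layer] = [
    make_layer("pre-processing 42"),
    make_layer("objects clustering 44"),
    make_layer("scene characterization 46"),
    make_layer("background updating 48"),
]


def process_frame(frame: Any, ctx: Context,
                  layers: List[Layer] = PER_FRAME_LAYERS) -> Context:
    """Pass one captured frame through every per-frame layer in order."""
    ctx["frame"] = frame
    for layer in layers:
        ctx = layer(ctx)
    return ctx


if __name__ == "__main__":
    print(process_frame("frame-0", {})["trace"])
```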
Fig. 3 shows a block diagram illustrating the
components of the configuration layer. The configuration layer 38 comprises a
reference image constructor component 50, a timing parameters definer
component 52, and a visual parameters definer component 54. The reference
image constructor component 50 is responsible for the acquisition of the
background model. The reference image is generated in accordance with a
pre-defined option. The component 50 includes a current frame capture module 56,
a reference image loading module 60, and a reference image learning module
62. In accordance with the pre-selected option the reference image may be
created alternatively from: a) a currently captured frame, b) an existing
reference image, c) a reference image learning module. The current frame
capture module 56 provides a currently captured frame to be used as the
reference image. The currently captured frame can be a frame from any camera
covering the scene. The reference image loading module 60 provides the option
of loading an existing reference image located on file locally or remotely. The
user may select the appropriate image from the file and designate it as the
reference image. The reference image learning module 62 provides the option
that the reference image is adaptively learned from a consecutive
sequence of captured images. The timing parameters definer component 52
provides time settings information, such as the number of time units to be
elapsed before the generation of a trigger on a static object, and the like. The
visual parameters definer component 54 provides the option to the user to
define the geometry of the monitored scene. The component 54 includes a
camera tilt setting module 64, a camera zoom setting module 65, a region
location definition module 66, a region type definition module 67, and an alarm
type definition module 68. The module 64 derives the camera tilt in
accordance with the measurements taken by a user of an arbitrary object
located at different locations in the monitored scene. The module 65 defines the
maximum, the minimum and the typical size of the objects to be tracked. The
region location definition module 66 provides the definition of the location of
one or more regions-of-interest in the scene. The region type definition module
67 enables the user to define a region of interest as "objects track region" or "no
objects track region". The alarm type definition module 68 defines a region of
interest as "trigger alarm in region" or "no alarm trigger in region", in
accordance with the definitions of the user.
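
For illustration, the user-defined parameters gathered by the configuration layer could be held in a structure such as the following Python sketch. All class and field names (`TrackerConfig`, `Region`, `ReferenceSource`, `static_trigger_time_units`, and so on) are hypothetical; the specification names the modules but does not prescribe any particular data layout or units.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ReferenceSource(Enum):
    """The three reference image options of component 50."""
    CURRENT_FRAME = "current_frame"   # module 56
    LOAD_EXISTING = "load_existing"   # module 60
    LEARN = "learn"                   # module 62


@dataclass
class Region:
    """A region of interest (modules 66, 67 and 68)."""
    polygon: List[Tuple[int, int]]
    track_objects: bool = True   # "objects track region" or not
    trigger_alarm: bool = True   # "trigger alarm in region" or not


@dataclass
class TrackerConfig:
    reference_source: ReferenceSource
    static_trigger_time_units: int    # component 52
    camera_tilt_deg: float            # module 64 (unit is an assumption)
    min_object_size: int              # module 65
    typical_object_size: int
    max_object_size: int
    regions: List[Region] = field(default_factory=list)

    def validate(self) -> None:
        """Reject inconsistent size or timing settings."""
        if not (0 < self.min_object_size
                <= self.typical_object_size
                <= self.max_object_size):
            raise ValueError("object sizes must satisfy 0 < min <= typical <= max")
        if self.static_trigger_time_units <= 0:
            raise ValueError("static trigger time must be positive")


if __name__ == "__main__":
    cfg = TrackerConfig(ReferenceSource.LEARN, 60, 30.0, 10, 40, 200,
                        [Region([(0, 0), (100, 0), (100, 50), (0, 50)])])
    cfg.validate()
    print(cfg.reference_source.value, len(cfg.regions))
```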
Referring now to Fig. 4A, showing a block diagram illustrating the
components of the pre-processing layer, in accordance with the preferred
embodiment of the present invention. The pre-processing layer 42 comprises a
current frame handler 212, a short-term reference image handler 214, a
long-term reference image handler 216, a pre-processor module 218, a short-term
difference image updater 220, and a long-term difference image updater 222.
Each module is a computer program operative to perform one or more tasks in
association with the computerized system of Fig. 1. The current frame handler
212 obtains a currently captured frame and passes the frame to the
pre-processor module 218. The short-term reference handler 214 loads an existing
short-term reference image and passes it to the pre-processor module
218. The handler 214 could further provide calculations concerning the
moments of the short term reference image. The long-term reference handler
216 loads an existing long-term reference image and passes it to the
pre-processor module 218. The handler 216 could further provide calculations
concerning the moments of the long term reference image.

The pre-processor module 218 uses the current frame and the
obtained reference images as input for processing. The process generates a new
short-term difference image and a new long-term difference image and
subsequently passes the new difference images to the short-term difference
image updater (handler) 220 and the long-term difference image updater
(handler) 222 respectively. Using the new difference images the updater 220
and the updater 222 update the existing short-term reference image and the
existing long-term reference image respectively.
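
The following Python sketch illustrates, under simplifying assumptions, how a short-term and a long-term difference image could be derived from one frame. A plain per-pixel absolute difference on grayscale values is used here as a stand-in; the specification does not state that this is the measure computed by the pre-processor module 218, and the names `abs_diff` and `preprocess` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def abs_diff(frame: Image, reference: Image) -> Image:
    """Per-pixel absolute difference of two equally sized images."""
    if len(frame) != len(reference) or any(
        len(a) != len(b) for a, b in zip(frame, reference)
    ):
        raise ValueError("frame and reference must have the same shape")
    return [
        [abs(p - q) for p, q in zip(row_f, row_r)]
        for row_f, row_r in zip(frame, reference)
    ]


def preprocess(frame: Image, short_ref: Image,
               long_ref: Image) -> Tuple[Image, Image]:
    """Return (short-term difference, long-term difference)."""
    return abs_diff(frame, short_ref), abs_diff(frame, long_ref)


if __name__ == "__main__":
    long_ref = [[10, 10], [10, 10]]    # background only
    short_ref = [[10, 200], [10, 10]]  # background plus a static object
    frame = [[10, 200], [90, 10]]      # static object plus a moving one
    short_diff, long_diff = preprocess(frame, short_ref, long_ref)
    print(short_diff)
    print(long_diff)
```

In this toy example the static object appears only in the long-term difference, while the moving object appears in both, which is the distinction the two reference images are meant to provide.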

Referring now to Fig. 4B, showing a block diagram illustrating the
components of the clustering layer, in accordance with the preferred
embodiment of the present invention. The clustering layer 44 comprises an
object merger module 231, an objects group builder module 232, an objects
group adjuster module 234, a new objects creator module 236, an object
searcher module 240, a Kalman filter module 242, and an object status updater
254. Each module is a computer program operative to perform one or more
tasks in association with the computerized system of Fig. 1. The object merger
module 231 corrects clustering errors by the successive merging of partially
overlapping objects having the same motion vector for a pre-defined period.
The objects group builder 232 is responsible for creating groups of close
objects by using neighborhood relations among the objects. The objects group
adjuster 234 initiates a group adjustment process in order to find the optimal
spatial parameters of each object in a group. The new objects creator
module 236 constructs new objects from the difference images, controls the
operation of a specific object location and size finder function, and adjusts new
objects. The new objects may be constructed from the difference images whether
or not there are existing objects to compare with. For
example, when the system begins operation a new object may be identified
even if there are no previously acquired and existing objects. The object
searcher 240 scans a discarded objects archive in order to attempt to locate
recently discarded objects with parameters (such as spatial parameters) similar
to a newly created object.

In order to improve the accuracy of the tracking and in order to reduce
the computing load, a Kalman filter module 242 is utilized to track the motion
of the objects. The object status updater 254 is responsible for modifying the
status of the object from "static" to "dynamic" or from "dynamic" to "static". A
detailed description of the clustering layer 44 will be set forth herein under in
association with the following drawings.
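
As a hedged illustration of the merging rule attributed above to the object merger module 231, the Python sketch below merges two objects when their bounding boxes overlap, their motion vectors are equal, and the condition has held for a pre-defined number of frames. The axis-aligned bounding-box representation, the exact-equality test on motion vectors, and the names `TrackedObject`, `overlaps` and `try_merge` are all assumptions; the specification does not define them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrackedObject:
    x0: int  # left edge (inclusive)
    y0: int  # top edge (inclusive)
    x1: int  # right edge (exclusive)
    y1: int  # bottom edge (exclusive)
    motion: Tuple[int, int]  # motion vector in pixels per frame


def overlaps(a: TrackedObject, b: TrackedObject) -> bool:
    """True when the two bounding boxes share at least one pixel."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def try_merge(a: TrackedObject, b: TrackedObject,
              frames_together: int,
              min_frames: int) -> Optional[TrackedObject]:
    """Return the merged object, or None when the rule is not satisfied."""
    if (overlaps(a, b) and a.motion == b.motion
            and frames_together >= min_frames):
        return TrackedObject(min(a.x0, b.x0), min(a.y0, b.y0),
                             max(a.x1, b.x1), max(a.y1, b.y1), a.motion)
    return None


if __name__ == "__main__":
    a = TrackedObject(0, 0, 10, 10, (1, 0))
    b = TrackedObject(5, 5, 15, 15, (1, 0))
    print(try_merge(a, b, frames_together=5, min_frames=3))
```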

Referring now to Fig. 5A, showing a block diagram illustrating the
components of the scene characterization layer, in accordance with the
preferred embodiment of the present invention. The scene characterization
layer 46 comprises an object movement measurement module 242, an object
merger module 244, and a triggering mechanism 246. The object movement
measurement module 242 analyzes the changes in the spatial parameters of an
object and determines whether the object is moving or stationary. The object
merger module 244 is responsible for correcting errors to objects as a result of
the clustering stage. The functionality of the triggering mechanism 246 is to
check each object against the spatio-temporal behavior patterns and properties
defined as "suspicious" or as alarm triggering. When a suitable match is found
the mechanism 246 generates an alarm trigger. The operation of the scene
characterization layer 46 will be described herein under in association with the
following drawings.
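
A minimal Python sketch of the two steps just described, a movement test followed by a trigger check, is given below. The displacement tolerance, the record fields and the single "static object in an alarm region" pattern are illustrative assumptions; the actual behavior patterns matched by the triggering mechanism 246 are not limited to this case, and the names `ObjectRecord`, `is_stationary` and `check_triggers` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectRecord:
    object_id: int
    status: str             # "static" or "dynamic"
    non_moving_time: int    # time units without movement
    in_alarm_region: bool   # inside a "trigger alarm in region" area


def is_stationary(prev_center: Tuple[float, float],
                  center: Tuple[float, float],
                  tolerance: float) -> bool:
    """Treat an object as stationary when its center moved at most `tolerance`."""
    dx = center[0] - prev_center[0]
    dy = center[1] - prev_center[1]
    return dx * dx + dy * dy <= tolerance * tolerance


def check_triggers(objects: List[ObjectRecord],
                   static_time_limit: int) -> List[int]:
    """Return the ids of objects that match the assumed alarm pattern."""
    return [
        o.object_id
        for o in objects
        if o.status == "static"
        and o.in_alarm_region
        and o.non_moving_time >= static_time_limit
    ]


if __name__ == "__main__":
    records = [
        ObjectRecord(1, "static", 120, True),
        ObjectRecord(2, "dynamic", 0, True),
        ObjectRecord(3, "static", 120, False),
    ]
    print(check_triggers(records, static_time_limit=60))
```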

Referring now to Fig. 5B, showing a block diagram illustrating the
components of the background update layer, in accordance with the preferred
embodiment of the present invention. The background updating layer 48
comprises a background draft updater 248, a short-term reference image
updater 250, and a long-term reference image updater 252. The functionality of
the updater 248 is to update continuously the background or reference "draft"
frame from the current frame. The short-term reference image updater 250 and
the long-term reference image updater 252 maintain the short-term reference
image and the long-term reference image, respectively. A detailed description
of the operation of the background updating layer 48 will be provided herein
under in association with the following drawings.

Referring now to Fig. 6 showing a block diagram of the data
structures associated with the object tracking apparatus, in accordance with a
preferred embodiment of the present invention. The object tracking control
structures 16 of Fig. 1 comprise a long-term reference image 70, a short-term
reference image 72, an objects table 74, a sophisticated absolute distance
(SAD) short-term map 76, a sophisticated absolute distance (SAD) long-term
map 78, a discarded objects archive 82, and a background draft 84. The
long-term reference image 70 includes the background image of the monitored
scene without the dynamic and without the static objects tracked by the
apparatus and method of the present invention. The short-term reference image
72 includes the scene background image and the static objects tracked by the
object tracking method. The objects table 74 includes a list of dynamic and
static objects with associated object data and object meta data. The object
data includes object identification, object status, and various control fields,
such as a non-moving counter, a non-moving-time counter, and the like. The
meta data comprises information concerning the current spatial parameters, the
properties and the motion vector data of the objects acquired from the
previously performed processing on a succession of previous frames. The
short-term and long-term sophisticated difference maps (SADs) 76, 78 represent
the difference between a currently captured frame and the short-term and
long-term reference images 72, 70, respectively. The discarded objects archive
82 stores discarded objects for object history. The background draft 84 (also
referred to as the reference image, but not the short-term or long-term
reference images) is a constantly changing image of the monitored scene. Each
pixel within each current frame is taken into consideration when calculating
the background draft 84. The draft 84 is used for inserting "static" objects
into the short-term reference image 72. The background draft 84 constantly
reviews the scene background. If an object enters the monitored scene, such
object is inserted into the background draft 84. When the method determines
that the object is a "static" object (after the object was perceived as
stationary across a pre-defined number of captured frames), the pixels of the
object are copied from the background draft 84 to the short-term reference
image 72.
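
The control structures of Fig. 6 can be pictured as a small set of records. The following is a minimal sketch only: the class names, field names, and the use of nested lists for images are assumptions made for illustration and do not appear in the specification; the reference numerals in the comments are the ones used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Image = List[List[int]]  # hypothetical row-major grayscale image, image[y][x]


@dataclass
class TrackedObject:
    """One entry of the objects table 74 (field names are illustrative)."""
    object_id: int
    status: str = "dynamic"              # "dynamic" or "static"
    non_moving_counter: float = 0.0      # control field
    non_moving_time: float = 0.0         # control field
    center: Tuple[float, float] = (0.0, 0.0)         # meta data: spatial parameters
    motion_vector: Tuple[float, float] = (0.0, 0.0)  # meta data: motion vector


@dataclass
class TrackingControlStructures:
    """Container mirroring the structures listed for Fig. 6."""
    long_term_reference: Image    # 70: background without any tracked objects
    short_term_reference: Image   # 72: background plus static objects
    background_draft: Image       # 84: constantly changing scene image
    objects_table: Dict[int, TrackedObject] = field(default_factory=dict)  # 74
    sad_short_term: Image = field(default_factory=list)                    # 76
    sad_long_term: Image = field(default_factory=list)                     # 78
    discarded_archive: List[TrackedObject] = field(default_factory=list)   # 82

    def discard(self, object_id: int) -> None:
        """Move an object from the objects table to the discarded archive."""
        self.discarded_archive.append(self.objects_table.pop(object_id))
```

Keeping the discarded objects in a separate list, rather than deleting them, reflects the stated purpose of the archive as object history.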

Referring now to Fig. 7, the object tracking module operates by
detecting objects across a temporally ordered sequence of consecutively
captured images where the objects do not belong to the "natural" or "static"
monitored scene. The object tracking module operates through the use of a
central processing unit (not shown) utilizing data structures (not shown). The
data structures are maintained on one or more memory or storage devices
installed across a hardware environment supporting the application. Fig. 7
illustrates the various steps in the operation of the object tracking method. The
configuration step (not shown) is performed prior to the beginning of the
tracking (steps 88 through 94). In the configuration step the object-tracking
module is provided with reference images, with timing parameters and with
visual parameters, such as regions-of-interest definitions. The provided
information enables the method to decide which regions of the frame to work
on and in which regions an alert situation should be produced. The
configuration step optionally includes a reference image learning step (not
shown) in which the background image is adaptively learned in order to
construct a long-term and a short-term reference image from a temporally
consecutive sequence of captured images. When no stationary objects were
detected in the last frames the long-term reference image is copied and
maintained as a short-term reference picture. The long-term reference image
contains no objects while the short-term reference image includes static objects,
such as objects that have been static for a pre-defined period. In the preferred
embodiment of the invention, the length of the pre-defined period is one minute
while in other preferred embodiments other time values could be used. The
long-term reference image and the short-term reference image are updated for
background changes, such as changes in the illumination artifacts associated
with the image (lights or shadows or constantly moving objects (such as trees)
and the like). The video frame pre-processing phase 88 uses a currently
captured frame and the short-term and long-term reference images for
generating new short-term and long-term difference images. The difference
images represent the difference between the currently captured frame and the
reference images. The reference images can be obtained from one of the image
sequence sources described in association with Fig. 1 or could be provided
directly by a user or by another system associated with the system of the
present invention. The difference images are suitably filtered or smoothed.
The clustering phase 90 generates new or updated objects from the difference
images and from the previously generated or updated objects. The scene
characterization phase 92 uses the objects received from the clustering phase 90
in order to describe the scene. The background updating step 94 updates the
short-term and long-term reference images for the next frame calculation. Note
should be taken that in other preferred embodiments of the invention other
similar or different processes could be used to accomplish the underlying
objectives of the method of the present invention.

Note should be taken that the proposed apparatus and method are
provided with the capability of functioning in specific situations where an image
acquiring device, such as a video camera, is not static. Examples of such
situations include a pole-mounted outdoor camera operating in windy
conditions or a mobile camera physically tracking a moving object. For such
situations the object tracking method requires a pre-pre-processing phase
configured such as to compensate for the potential camera movements between
the capture of the reference images and the capture of each current frame. The
pre-pre-processing phase involves an estimation of the relative overall frame
movement (registration) between the current frame and the reference images.
Consequent to the estimation of the registration (in terms of pixel offset) the
offset is applied to the reference images in order to extract "in-place" reference
images for the object tracking to proceed in a usual manner. As a result,
extended reference images have to be used, allowing for margins (the content
of which may be constantly updated) up to the maximal expected registration.

The estimation of the registration (offset) between the current frame
and the reference images involves a separate estimation of the x and y offset
components, and a joint estimation of the x and y offset components. For the
separate estimation, selected horizontal and vertical stripes of the current frame
and the reference images are averaged with appropriate weighting, and
cross-correlated in search of a maximum match in the x and y offsets, respectively.
For the joint estimation, diagonal stripes are used (in both diagonal directions),
from which the x and y offsets are jointly estimated. The resulting estimates are
then averaged to produce the final estimate.
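
The separate estimation of one offset component can be sketched as a one-dimensional cross-correlation of stripe profiles. This is an illustration under stated assumptions, not the registration routine of the specification: the function names, the uniform stripe weighting, the mean removal, and the search range `max_shift` are all hypothetical choices.

```python
from typing import List, Sequence


def column_profile(image: Sequence[Sequence[float]], rows: Sequence[int]) -> List[float]:
    """Average the selected rows (a horizontal stripe) into one profile.

    Uniform weights are assumed here; the text only says "appropriate weighting".
    """
    width = len(image[0])
    return [sum(image[r][x] for r in rows) / len(rows) for x in range(width)]


def best_offset(current: Sequence[float], reference: Sequence[float], max_shift: int) -> int:
    """Return the shift s that best aligns current[i] with reference[i - s].

    The score is the mean-removed correlation averaged over the overlapping
    samples, so that shorter overlaps are not penalized.
    """
    mean_cur = sum(current) / len(current)
    mean_ref = sum(reference) / len(reference)
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i in range(len(current)):
            j = i - shift
            if 0 <= j < len(reference):
                total += (current[i] - mean_cur) * (reference[j] - mean_ref)
                count += 1
        if count == 0:
            continue
        score = total / count
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

Running `best_offset` on a row profile gives an x estimate and on a column profile gives a y estimate; the diagonal-stripe joint estimate described above is not covered by this sketch.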

Referring now to Fig. 8 which describes the operation of the
reference image learning routine, in accordance with a preferred embodiment
of the present invention. The construction of the long-term and short-term
reference images could be carried out in several alternative ways. A currently
captured frame could be stored on a memory device as the long-term reference






wo 03/067884 PCT/IL03/00097 

image. Alternatively, a previously stored long-term reference image could be
loaded from the memory device in order to be used as the current long-term
reference image. Alternatively, a specific reference image learning
process could be activated (across steps 100 through 114). In step 100 the
reference image learning process is performed across a temporally consecutive
sequence of captured images where each of the frames is divided into macro
blocks (MB) having a pre-defined size, such as 16X16 pixels or 32X32 pixels
or any other like division into macro blocks. Next at step 102 each MB is
examined for motion vectors. The motion is detected by comparing the MB in a
specific position in the currently captured frame to the MB in the same position
in the previously captured frame. The comparison is performed during the
encoding step by using similar information generated therein for video data
compression purposes. According to the result of the examination each MB is
marked as being in one of the following three states: a) Motion MB 108 where
a motion vector is detected in the current MB relative to the parallel MB in the
previously captured frame, b) Undefined MB 104 where no motion vector is
detected in the MB relative to the parallel MB in the previously captured frame
but a motion vector was detected across a previously captured set of temporally
consecutive frames where the sequence is defined as having a pre-defined
number of frames. In the preferred embodiment of the invention the number of
frames in the sequence is about 150 frames while in other preferred
embodiments of the invention different values could be used, c) Background
MB 106 where no motion vector was detected across the previously captured
sequence of temporally consecutive frames. In step 110 the values of each of
the pixels in an MB that was identified as a Background MB are obtained and
in step 112 the values are averaged in time. In step 114 an initial short-term
and long-term reference image is generated from the values averaged in time. In
order to avoid undetermined values for pixels in the MBs that were always in
motion, such as an MB wherein there was a constant motion (trees moving in
wind), in step 114 the short-term reference image is created such that it
contains the averages of the values of pixels in time. Subsequently, the pixels
are examined in order to find which pixels had insufficient background time
(MBs that were always in motion). Pixels without sufficient background time
are given the value from the short-term reference image.
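
The three macro block states of step 102 can be sketched as a function of a per-block motion history. The names `MBState` and `classify_mb`, and the representation of the history as a list of booleans, are assumptions for illustration; the default window of 150 frames is the value quoted for the preferred embodiment.

```python
from enum import Enum
from typing import List


class MBState(Enum):
    MOTION = "motion"          # motion vector detected in the current frame
    UNDEFINED = "undefined"    # quiet now, but motion seen within the window
    BACKGROUND = "background"  # no motion anywhere in the window


def classify_mb(motion_history: List[bool], window: int = 150) -> MBState:
    """Classify one macro block.

    motion_history[-1] refers to the newest frame; True means that a motion
    vector was detected for the block in that frame.
    """
    if not motion_history:
        raise ValueError("motion_history must contain at least one frame")
    if motion_history[-1]:
        return MBState.MOTION
    if any(motion_history[-window:]):
        return MBState.UNDEFINED
    return MBState.BACKGROUND
```

Only blocks classified as `BACKGROUND` would contribute pixel values to the temporal average of steps 110 and 112.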

Referring now to Fig. 9 showing the input and output data structures
associated with the pre-processing layer, in accordance with a preferred
embodiment of the present invention. The pre-processing step 88 of Fig. 7
employs the current frame 264 and the short-term reference image 262 to
generate a short-term difference image 270. The step 88 further uses the current
frame 264 and the long-term reference image 266 to generate a long-term
difference image 272. The long-term 272 and short-term 270 difference images
represent respectively the sophisticated absolute difference (SAD) between the
current frame 264 and the long-term 266 and the short-term 262 reference
images. The size of the difference images (referred to hereinafter as SAD
maps) 270, 272 is equal to the size of the current frame 264. Each pixel in the
SAD maps 270, 272 is provided with an arbitrary value in the range of 1
through 101. Other values may be used instead. High values indicate a
substantial difference between the value of the pixel in the reference images
262, 266 and the value of the pixel in the currently captured frame 264. Thus,
the score indicates the probability for the pixel belonging either to the scene
background or to an object. The generation of the SAD maps 270, 272 is
achieved by performing one of two alternative methods.

Still referring to Fig. 9, in the first pre-processing method for each
specific pixel in the currently captured frame 264 the absolute difference
between the specific pixel and the matching pixel in the reference images 262,
266 is calculated where the calculation takes into account the average pixel
value:

(1): $D(x, y) = a_0 \times Y_{\min}(x, y) + a_1 \times Y_{\max}(x, y) + a_3$

In the above equation the values of x, y concern the pixel
coordinates. The values of Ymin and of Ymax represent the lower and the
higher luminance levels at (x, y) between the current frame 264 and the
reference images 262, 266. The values of a0, a1, and a3 are thresholds, designed
to minimize D(x, y) for similar pixels and maximize it for non-similar pixels.
Consequent to the performance of the above equation for each of the pixels and
to the generation of the SAD maps 270, 272 the SAD maps 270, 272 are
filtered for smoothing with two Gaussian filters, one in the X coordinate and the
second in the Y coordinate.
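
The first method can be sketched as a per-pixel score followed by a separable smoothing pass. The specification does not give values for a0, a1, and a3, nor the Gaussian kernel; the defaults below (a0 = -1, a1 = 1, a3 = 0, which reduce equation (1) to a plain absolute difference, and a 3-tap kernel) are hypothetical and chosen only to make the example concrete.

```python
from typing import List, Sequence


def first_method_score(cur: float, ref: float,
                       a0: float = -1.0, a1: float = 1.0, a3: float = 0.0) -> float:
    """Equation (1) for one pixel; the coefficient defaults are assumptions."""
    y_min = min(cur, ref)
    y_max = max(cur, ref)
    return a0 * y_min + a1 * y_max + a3


def smooth_1d(values: Sequence[float],
              kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """Filter one row or column, replicating the edge samples."""
    n = len(values)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += weight * values[j]
        out.append(acc)
    return out


def smooth_separable(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the 1-D filter along X, then along Y."""
    rows = [smooth_1d(row) for row in image]               # X pass
    cols = [smooth_1d(list(col)) for col in zip(*rows)]    # Y pass on columns
    return [list(row) for row in zip(*cols)]               # transpose back
```

Splitting the smoothing into an X pass and a Y pass matches the two one-dimensional filters mentioned in the text and costs fewer multiplications than a full two-dimensional kernel.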

In the second alternative pre-processing method, around each pixel
P(x, y) the following values are calculated where the calculation uses a 5X5
pixels neighboring window for filtering. This step could be referred to as
calculating the moments of each pixel.
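(2):

$$M_{00}(x, y) = \sum_{i=x-2}^{x+2} \sum_{j=y-2}^{y+2} P(i, j)$$

$$M_{10}(x, y) = \frac{32 \sum_{i=x-2}^{x+2} \sum_{j=y-2}^{y+2} (x - i)\, P(i, j)}{M_{00}(x, y)}$$

$$M_{01}(x, y) = \frac{32 \sum_{i=x-2}^{x+2} \sum_{j=y-2}^{y+2} (y - j)\, P(i, j)}{M_{00}(x, y)}$$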



The results of the equations represent the following values: a) M00
is the sum of all the pixels around the given pixel, b) M10 is the sum of all the
pixels around the given pixel each multiplied by a filter that detects horizontal
edges, and c) M01 is the sum of all the pixels around a given pixel multiplied
by a filter that detects vertical edges. Next, the absolute difference between
these three values in the current frame 264 and the reference images 262, 266 is
calculated. In addition the minimum of M00curr and M00ref is calculated.
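(3):

$$D_{00}(x, y) = \left| M_{00}^{curr}(x, y) - M_{00}^{ref}(x, y) \right|$$
$$D_{10}(x, y) = \left| M_{10}^{curr}(x, y) - M_{10}^{ref}(x, y) \right|$$
$$D_{01}(x, y) = \left| M_{01}^{curr}(x, y) - M_{01}^{ref}(x, y) \right|$$
$$Min(x, y) = \min\left( M_{00}^{curr}(x, y),\; M_{00}^{ref}(x, y) \right)$$
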
Next the following equations are used to construct the desired SAD
maps 270, 272:
(4):

$$Tmp1(x, y) = A0 \cdot (D_{00}(x, y) + W0) - Min(x, y)$$
$$Tmp2(x, y) = A1 \cdot D_{10}(x, y) + W1$$
$$Tmp3(x, y) = A1 \cdot D_{01}(x, y) + W1$$
$$A0 = 15, \quad A1 = 25, \quad W0 = -40, \quad W1 = -44$$

(5):

$$Tmp1(x, y) = \min(32, Tmp1(x, y))$$
$$Tmp1(x, y) = \max(-32, Tmp1(x, y))$$
$$Tmp2(x, y) = \min(32, Tmp2(x, y))$$
$$Tmp2(x, y) = \max(-32, Tmp2(x, y))$$
$$Tmp3(x, y) = \min(32, Tmp3(x, y))$$
$$Tmp3(x, y) = \max(-32, Tmp3(x, y))$$

(6):

$$TmpSADMap(x, y) = \frac{[\text{numerator illegible in the source}]}{64}$$

Through a convolution calculation the grade for each pixel is
calculated while taking into consideration the values of the pixel's neighbors:

(7):

$$SADMap(x, y) = 1 + \sum_{i=x-2}^{x+2} \sum_{j=y-2}^{y+2} TmpSADMap(i, j)$$
$$SADMap(x, y) = \min(SADMap(x, y), 101)$$

The method takes into consideration the texture of the current frame
264 and the reference images 262, 266 and compares therebetween. The
second pre-processing method is favorable since it is less sensitive to light
changes.
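
The moment calculation of equation (2) and the clamped terms of equations (3) through (5) can be sketched as follows. Several points are assumptions: the sign convention (x - i) and (y - j) inside the sums, the handling of a window whose sum is zero, and the function names. The constants A0, A1, W0, and W1 are the values listed with equation (4). The combination step of equation (6) is not reproduced, since its numerator is not legible.

```python
from typing import List, Sequence, Tuple

A0, A1, W0, W1 = 15, 25, -40, -44  # constants listed with equation (4)


def moments(img: Sequence[Sequence[float]], x: int, y: int) -> Tuple[float, float, float]:
    """Return (M00, M10, M01) over the 5x5 window centred on (x, y).

    img is indexed as img[y][x]; the caller must keep a 2-pixel margin.
    """
    m00 = s10 = s01 = 0
    for j in range(y - 2, y + 3):
        for i in range(x - 2, x + 3):
            p = img[j][i]
            m00 += p
            s10 += (x - i) * p  # assumed sign convention
            s01 += (y - j) * p  # assumed sign convention
    if m00 == 0:
        return 0, 0.0, 0.0      # assumption: avoid division by zero
    return m00, 32 * s10 / m00, 32 * s01 / m00


def clamp(value: float, low: float = -32, high: float = 32) -> float:
    """Equation (5): limit a term to the range [-32, 32]."""
    return max(low, min(high, value))


def tmp_terms(cur: Sequence[Sequence[float]], ref: Sequence[Sequence[float]],
              x: int, y: int) -> Tuple[float, float, float]:
    """Equations (3) to (5) for one pixel: the three clamped Tmp terms."""
    c00, c10, c01 = moments(cur, x, y)
    r00, r10, r01 = moments(ref, x, y)
    d00, d10, d01 = abs(c00 - r00), abs(c10 - r10), abs(c01 - r01)
    minimum = min(c00, r00)
    return (clamp(A0 * (d00 + W0) - minimum),
            clamp(A1 * d10 + W1),
            clamp(A1 * d01 + W1))
```

Because M10 and M01 are normalized by M00, a uniform brightening of the window changes them little, which is consistent with the statement that the second method is less sensitive to light changes.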

At the price of increased computational cost, higher moments could
optionally be calculated in order to achieve a more accurate model.
Calculating higher moments involves the performance of the following set of
equations:

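(8): [equations for the higher moments; only the double sums over the 5X5
window (x-2 to x+2, y-2 to y+2) and the normalization by M00 are legible in
the source]
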

It will be easily perceived that the method could be broadened for
even higher moments. Several equations of the second pre-processing method
represent a simulation of a neural network.

The pre-processing step produces several outputs that are used by the
clustering step. Such outputs are the short-term and long-term SAD maps.
Each pixel in the SAD maps is assigned a value in the range of 1 through 100.
High values indicate a great difference between the value of the pixel in the
reference images and the value of the pixel in the current frame. The purpose of
the clustering is to cluster the high difference pixels into objects. Referring now
to Fig. 10A the clustering step 120 includes a two-stage Kalman filtering, two
major processing sections, and an object status updating. In order to improve
the accuracy of the tracking and in order to reduce the computing load a Kalman
filter is used to track the motion of the objects. The Kalman filter is performed
in two steps. The prediction step 120 is performed before the adjustment of the
objects and the update step 125 is performed after the creation of a new object.
The Kalman state of the object is updated in accordance with the adjusted
parameters of the object. At step 204 the status of the object is updated. The
changing of the object status from "dynamic" status to "static" status is
performed as follows: If the value of the non-moving counter associated with
the object exceeds a specific threshold then the status of the object is set to
"static". The dead-area (described in the clustering step) is calculated and
saved. The pixels that are bounded within the object are copied from the
background draft to the short-term reference image. Subsequently, the status of
the object is set to "static". Static objects are not adjusted until their status is
changed back to "dynamic".
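
The status update of step 204 can be sketched as a threshold test followed by a pixel copy. The dictionary keys, the `inside` predicate standing in for the bounding ellipse test, and the function name are hypothetical; the dead-area calculation mentioned above is omitted from this sketch.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # hypothetical row-major image, image[y][x]


def update_status(obj: Dict, threshold: float, draft: Image, short_term: Image,
                  inside: Callable[[int, int], bool]) -> bool:
    """Promote a dynamic object to static; return True if it was promoted.

    obj is a record with "status" and "non_moving_counter" entries, and
    inside(x, y) reports whether a pixel lies within the object's boundary.
    """
    if obj["status"] != "dynamic" or obj["non_moving_counter"] <= threshold:
        return False
    # Copy the object's pixels from the background draft into the
    # short-term reference image.
    for y, row in enumerate(draft):
        for x, value in enumerate(row):
            if inside(x, y):
                short_term[y][x] = value
    obj["status"] = "static"
    return True
```

After the copy, the object no longer differs from the short-term reference image, while it still differs from the long-term reference image.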

Still referring to Fig. 10A in the processing step 122, in order to
perform tracking of the objects that were detected in the previous video frames,
the parameters of the existing objects are adjusted. In the processing step 124
new objects are created from all the high value pixels that do not belong to the
already created objects. The adjustment of the object parameters is done for
every group of objects. The objects are divided into groups in accordance with
their location. Objects of a group are close to each other and might occlude
each other. Objects from different groups are distant from each other. The
adjustment of groups of objects provides for the appropriate handling of
occlusion situations.
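
Dividing objects into groups by proximity amounts to finding connected components of a neighborhood relation. The sketch below makes two simplifying assumptions: each object is approximated by a circle rather than by its bounding ellipse, and the neighbor distance `max_gap` is a parameter (4 pixels is used as the default here for illustration). All names are hypothetical.

```python
from math import hypot
from typing import Dict, List, Tuple

Circle = Tuple[float, float, float]  # (cx, cy, radius): stand-in for an ellipse


def gap(a: Circle, b: Circle) -> float:
    """Minimum distance between two circles (0 when they touch or overlap)."""
    return max(0.0, hypot(a[0] - b[0], a[1] - b[1]) - a[2] - b[2])


def build_groups(objects: List[Circle], max_gap: float = 4.0) -> List[List[int]]:
    """Group object indices so that neighbors always share a group (union-find)."""
    parent = list(range(len(objects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if gap(objects[i], objects[j]) <= max_gap:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(objects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Note that two objects can land in the same group without being direct neighbors, provided a chain of neighbors links them.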

Referring now to Fig. 10B at step 126 the object groups are built.
An object-specific bounding ellipse represents each object. The functionality,
structure and operation of the ellipse will be described hereinafter in
association with the following drawings. Every two objects are identified as
neighbors if the minimum distance between their bounding ellipses is up to
about 4 pixels. Using the neighborhood relations between every two objects,
the object groups are built. Note should be taken that static objects are not
adjusted. At step 128 the parameters of the existing dynamic objects are
adjusted in order to perform tracking of the objects detected in the previously
captured video frames. The objects are divided into groups according to their
locations. Objects of a group are close to each other and may occlude each
other. Objects belonging to different groups are distant from each other. The
adjustment of the object parameters is performed for every group of objects
separately. The adjustment of groups of objects enables appropriate handling of
occlusion situations. At step 126 groups of objects are built. Each object is
represented by a bounding marker, which is a distinct artificially generated
graphical structure, such as an ellipse. A pair of objects is identified as two
neighboring members if the minimum distance between their marker ellipses is
up to a pre-defined number of pixels. In the preferred embodiment of the
invention the pre-defined number of pixels is 4 while in other embodiments
different values could be used. Using the neighborhood relations between all
the pairs of objects the object groups are built. At step 128 the object groups
are adjusted. The object group adjustment process determines the optimal
spatial parameters of each object in the objects group. Each set of spatial
parameter values of all the objects in a given objects group is scored. The
purpose of the adjustment process is to find the spatial parameters of each
object in a group, such that the total score of the group is maximized. The
initial parameters are the values generated for the previously captured frame.
The initial base score is derived from a predictive Kalman filter. In each
adjustment iteration, a pre-defined number of geometric operations are
performed on the objects. The operations effect changes in the parameters of
every object in the group. Various geometric operations could be used, such as
translation, scaling (zooming), rotation, and the like. In the preferred
embodiment of the invention, the number of geometric operations applied to
the object is 10 while in other preferred embodiments different values could be
applied. In the preferred embodiment of the invention, the following
geometrical operations with the respective values are used: a) Translation right
on axis 1, b) Translation left on axis 1, c) Translation right on axis 2, d)
Translation left on axis 2, e) Down-scaling by shrinking axis 1, f) Up-scaling by
blowing axis 1, g) Down-scaling by shrinking axis 2, h) Up-scaling by blowing
axis 2, i) Rotation to the left through 5 degrees, and j) Rotation to the right
through 5 degrees. The score of every change is measured and saved in a table.
The structure and the constituent elements of the table are described via a
representation of an exemplary table as follows:



            Adj 1   Adj 2   Adj 3   Adj 4   Adj 5   Adj 6   Adj 7   Adj 8   Adj 9   Adj 10
Object 1    100     102     101     105     104     108     110     108     100     120
Object 2    150     80      105     104     110     112     114     121     119     120
Object 3    123     121     112     114     119     117     109     108     105     101

In the example above there are 3 objects in the group where each
row represents an object. The 10 adjustments performed on each object are
represented by the results shown in each row. Performing adjustment 1 on the
2nd object yields the maximum score for the group. Thus, adjustment 1 will be
applied to the parameters of the 2nd object. The score is weighted by the
non-movement-time of the object. As a result the algorithm tends not to perform
changes on objects that were not in movement for a significant period. The
iterative process is performed in order to improve the score of the group as
much as possible. The iterative process stops if at least one of the following
conditions is satisfied: a) the highest score found in the iteration is no greater
than the score at the beginning of the iteration, and b) at least twenty iterations
have been completed.
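
The iteration described above is a greedy search over the score table. In the sketch below, `score_table` and `apply` are hypothetical callbacks standing in for the scoring of all candidate adjustments and for the application of the chosen one; the weighting by non-movement time is assumed to be already folded into the scores.

```python
from typing import Callable, List, Tuple


def best_adjustment(table: List[List[float]]) -> Tuple[int, int, float]:
    """Return (object index, adjustment index, score) of the highest entry.

    Indices are 0-based; the first maximum wins when scores tie.
    """
    best = (0, 0, table[0][0])
    for obj, row in enumerate(table):
        for adj, score in enumerate(row):
            if score > best[2]:
                best = (obj, adj, score)
    return best


def adjust_group(score_table: Callable[[], List[List[float]]],
                 apply: Callable[[int, int], None],
                 base_score: float, max_iterations: int = 20) -> float:
    """Apply the best adjustment per iteration until no improvement remains."""
    for _ in range(max_iterations):
        obj, adj, score = best_adjustment(score_table())
        if score <= base_score:  # stop condition a)
            break
        apply(obj, adj)
        base_score = score
    return base_score            # stop condition b) is the loop bound
```

Applied to the exemplary table, `best_adjustment` selects the entry 150, which is adjustment 1 of the 2nd object (indices 1 and 0 when counting from zero).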

In order to reduce the computational load, every ellipse parameter is
changed according to the movement thereof as derived by a Kalman filter used
to track the object. If the score of the group is higher than the base score
the change is applied and the new score will become the base score.

In order to handle occlusions a "united object" is built, which is a
union of all the objects in the group. Thus, each pixel that is associated with
more than one object in the group will contribute its score only once and not
for every member object that wraps it. The contribution of each pixel in the
SAD map to the total score of the group is set in accordance with the value of
the pixel.

(9):

$$Contribution = \begin{cases} +2 & HighTH < val \\ +1 & LowTH < val < HighTH \\ -1 & val < LowTH \end{cases}$$


Subsequent to the completion of the about 10 iterations, specific
object parameters associated with each group object are tested against specific
thresholds in order to check whether the object should be discarded. The object
parameters to be tested are the minimum object area, the minimum middle
score, the maximum dead area, and the overlap ratio.

a) Minimum object area concerns a threshold value limiting the
minimum permissible spatial extent for an object. If the object area
is smaller than the value of a pre-defined threshold then the object is discarded.
So, for example, random appearances of non-real objects or dirty lenses providing
random dark pixels are cleaned effectively.

b) Minimum middle score relates to the score of a circle that is
bounded in the ellipse representing the object. If the score of the circle is below
a pre-defined value of an associated threshold then the object is eliminated. A
low-score circle indicates a situation where two objects were in close proximity
in the scene and thus represented as one object (one ellipse) and then they
separated. Thus, the middle of the ellipse will have a lower score than the rest
of the area of the object.

c) Maximum dead area concerns an object that includes a large
number of low value pixels. If the number of such pixels is higher than an
associated threshold value then the object is discarded.

d) Overlap ratio regards occlusion situations. Occlusion is supported
up to about 3 levels. If most of the object is occluded by about 3 other objects
for a period of about 10 seconds, the object is a candidate to be discarded. If
there is more than one object in that group that should be eliminated then the
most recently moving object is discarded.

Subsequent to the completion of the parameters testing procedure
the non-discarded objects are cleared from the SAD map by setting the value of
the set of pixels bounded in the object ellipse to zero. The discarded objects are
saved in the discarded objects archive to be utilized as object history. The data
of every new object will be compared against the data of the recently discarded
objects stored in the archive in order to provide the option of restoring the
object from the archive.
Referring now to Fig. 10C, consequent to the adjustment of the
existing objects, the pixels in the SAD map are provided with values in the
range of 0 through 100. A value of zero means that the pixel belongs to an
existing object. The drawing shows the steps in the creation of new objects.
The construction of a new object is based on a pixel having a high value in the
SAD map. The procedure starts by searching for a free entry in the objects table
74 of Fig. 6 in order to enable the storage of the parameters of a new object
(not shown). The high value pixel is assumed to be the center of the object. In
order to derive the boundary of the new object a specific boundary locator
function, referred to hereinafter as the "spider function", is activated at step
130. The spider function includes a set of program instructions associated with
a control data structure. The control data structure contains location and size
data that define the spatial parameters of a spider-like graphical structure. The
spider-like structure is provided with about 16 extensible members (arms)
uniformly divided across 360 degrees. The extensible members of the
spider-like structure are connected to the perceived center of the new object and
dynamically radiate outward. The length of each extensible member is
successively increased until the far end of each member is spatially aligned
with a pixel having a high value in the SAD map. In order to handle small gaps
in the object "bridging" line segments of up to 4 pixels are allowed. Thus, if
there are more than 4 continuous low value pixels in the direction of the
radiation, the extension of a member will be discontinued. The member-specific
final coordinates are saved in X, Y arrays in the control data structure,
respectively, in order to indicate the suitable boundary points constituting the
boundary line of the new object. Next, in order to improve accuracy the central
point of the spider structure is re-calculated from the X, Y arrays, as follows:
(10): 

Then, subsequent to the re-location of the central point of the structure at
the Yc, Xc pixel coordinates the spider structure is re-built. Extending the
about 16 extensible members of the spider structure yields two arrays, Y[16] and
X[16]. If the spatial extent of the spider structure is sufficient the
parameters of the boundary ellipse are calculated. If the spatial extent of the
spider overlaps the area of an existing object the new object will not be created
unless its size is above a minimum threshold.
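
One possible reading of the spider function is sketched below: each arm walks outward from the center, remembers the last high-value pixel it met, and stops once more than 4 consecutive low-value pixels are seen or the image border is reached. The threshold `high`, the rounding of arm positions to pixel coordinates, and the re-centering by the plain mean of the boundary points are assumptions; the re-centering formula of the specification is not legible in the source.

```python
from math import cos, pi, sin
from typing import List, Sequence, Tuple


def spider(sad_map: Sequence[Sequence[float]], cx: int, cy: int,
           high: float = 50, arms: int = 16, max_gap: int = 4) -> Tuple[List[int], List[int]]:
    """Return the X and Y arrays of boundary points, one entry per arm."""
    height, width = len(sad_map), len(sad_map[0])
    xs: List[int] = []
    ys: List[int] = []
    for k in range(arms):
        angle = 2 * pi * k / arms
        dx, dy = cos(angle), sin(angle)
        last_x, last_y = cx, cy   # arm end: last high-value pixel reached
        low_run = 0               # consecutive low-value pixels seen so far
        step = 1
        while True:
            x = int(round(cx + step * dx))
            y = int(round(cy + step * dy))
            if not (0 <= x < width and 0 <= y < height):
                break             # reached the image border
            if sad_map[y][x] >= high:
                last_x, last_y = x, y
                low_run = 0       # gap was short enough to be bridged
            else:
                low_run += 1
                if low_run > max_gap:
                    break         # gap too long: stop extending this arm
            step += 1
        xs.append(last_x)
        ys.append(last_y)
    return xs, ys


def recenter(xs: Sequence[int], ys: Sequence[int]) -> Tuple[float, float]:
    """Assumed re-centering: mean of the boundary points."""
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

A caller would run `spider`, move the center to the result of `recenter`, and run `spider` again to obtain the final X and Y arrays.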

Still referring to Fig. 10C at step 132 the spider-like graphical structure
is converted to an ellipse-shaped graphical structure. An ellipse is provided
with 5 parameters calculated from the X, Y arrays as follows:

(11):

kmO k=0 *-0 

The covariance matrix of the ellipse is: 



(12):

C = 



The ellipse covariance matrix is scaled to wrap the geometric
average distance. The covariance matrix is multiplied by F, where F is
calculated in the following manner:
(13):



k = 0.A5 



At step 134 the new object is adjusted via the utilization of the same
adjustment procedure used for adjusting existing objects. The discarded objects
archive includes recently discarded objects. If the spatial parameters, such as
location and size, of a recently discarded object are similar to the parameters of
the new object, the discarded object is retrieved from the archive and the
tracking thereof is re-initiated. If no similar object is found in the archive then
the new object will get a new object ID, and the new object's data and meta
data will be inserted into the objects table. Subsequently tracking of the new
object will be initiated.

Referring now to Fig. 11 the output of the clustering step is the
updated spatial parameters of the object stored in the objects table. The scene
characterization layer 208 uses the existing objects to describe the scene. The
layer 208 includes program sections that analyze the changes in the spatial
parameters of the object, characterize the spatio-temporal behavior pattern of
the object, and update the properties of the object. The temporal parameters
and the properties of the object are suitably stored in the objects table. At step
210 object movement is measured. The measurement of the object movement is
performed as follows:

(14):

$$dV_x = \mathrm{sgn}(MeanX - PrevMeanX), \qquad dV_y = \mathrm{sgn}(MeanY - PrevMeanY)$$
$$AccMoveX = 0.5 \cdot AccMoveX + 0.5 \cdot dV_x, \qquad AccMoveY = 0.5 \cdot AccMoveY + 0.5 \cdot dV_y$$
$$AccDist = \sqrt{AccMoveX^2 + AccMoveY^2}$$

MeanX/Y is the location of the center of the object in the current
frame. PrevMeanX/Y is the location of the center of the object in the previous
frame. The value of the non-moving counter is updated in accordance with AccDist
as follows:

(15):

$$NonMoveCnt = \begin{cases} 0.95 \cdot NonMoveCnt & AccDist > 0.4 \\ NonMoveCnt + 1 & \text{otherwise} \end{cases}$$
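
The movement measurement and the counter update can be sketched as a small stateful tracker. The class name, the initial state, and the `threshold` parameter are assumptions; the threshold in particular is hard to read in the source, so it is left configurable and the default of 0.4 should be treated as a placeholder.

```python
from math import hypot


def sgn(value: float) -> int:
    """Sign function: -1, 0, or +1."""
    return (value > 0) - (value < 0)


class MovementTracker:
    """Accumulate direction-of-movement and maintain the non-moving counter."""

    def __init__(self, mean_x: float, mean_y: float, threshold: float = 0.4):
        self.prev_x, self.prev_y = mean_x, mean_y
        self.acc_move_x = 0.0
        self.acc_move_y = 0.0
        self.non_move_cnt = 0.0
        self.threshold = threshold  # assumed value, see the lead-in

    def update(self, mean_x: float, mean_y: float) -> float:
        """Process the object center of a new frame; return the counter."""
        dvx = sgn(mean_x - self.prev_x)
        dvy = sgn(mean_y - self.prev_y)
        # Exponentially weighted accumulation of the movement direction.
        self.acc_move_x = 0.5 * self.acc_move_x + 0.5 * dvx
        self.acc_move_y = 0.5 * self.acc_move_y + 0.5 * dvy
        acc_dist = hypot(self.acc_move_x, self.acc_move_y)
        if acc_dist > self.threshold:
            self.non_move_cnt *= 0.95   # moving: decay the counter
        else:
            self.non_move_cnt += 1      # not moving: count the frame
        self.prev_x, self.prev_y = mean_x, mean_y
        return self.non_move_cnt
```

Because only the sign of the displacement is accumulated, jitter that alternates direction from frame to frame cancels out, while a consistent drift in one direction quickly pushes AccDist above the threshold.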

In the unattended luggage application there is a possibility that a
standing or sitting person who does not make significant movements will
generate an alarm. In order to handle such false alarms, the algorithm checks
whether there is motion inside the object ellipse. If in at least 12 of the last 16
frames there was motion in the object, it is considered a moving object.
Consequently, the value of the non-moving counter is divided by 2. At step 212
an object merging mechanism is activated. There are cases in which an element
in the monitored scene, such as a person or a car, is represented by 2 objects
whose ellipses are partially overlapping due to clustering errors. The object
merging mechanism is provided for the handling of this situation. Thus, for
example, if at least 2 objects are close enough to each other ("close" as defined
for the clustering process) and are moving with the same velocity for more
than 8 frames then the two objects are considered as representing the same
element. Thus, the objects will be merged into a single object and a new ellipse
will be created to bound the merged objects. The new ellipse data is saved as
the spatial parameters of the older object while the younger object is discarded.
Each merge is performed between 2 objects at a time. If there are more than 2
overlapping objects that move together, additional merges will be performed.
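The two rules of this passage, the in-ellipse motion check and the pairwise merge test, can be illustrated with the following Python sketch. It rests on several assumptions that are not stated in the text: the names are hypothetical, "more than 8 frames" is read as consecutive frames, velocities are compared for exact equality, and the closeness test and the construction of the new bounding ellipse are left to the caller.

```python
from collections import deque
from dataclasses import dataclass, field

MOTION_WINDOW = 16      # "the last 16 frames"
MOTION_MIN_FRAMES = 12  # "at least 12"
MERGE_MIN_FRAMES = 8    # merge after "more than 8 frames"


@dataclass
class TrackedObject:
    obj_id: int
    age: int            # frames since creation; larger means older
    velocity: tuple     # (vx, vy), assumed quantized so that equality is meaningful
    non_move_cnt: float = 0.0
    motion_history: deque = field(
        default_factory=lambda: deque(maxlen=MOTION_WINDOW))


def apply_inner_motion_check(obj: TrackedObject, motion_in_ellipse: bool) -> None:
    """Halve the non-moving counter when motion was seen in 12 of the last 16 frames."""
    obj.motion_history.append(bool(motion_in_ellipse))
    if sum(obj.motion_history) >= MOTION_MIN_FRAMES:
        obj.non_move_cnt /= 2


def update_pair_counter(counter: int, a: TrackedObject, b: TrackedObject,
                        is_close: bool) -> int:
    """Count consecutive frames in which a and b are close and share a velocity."""
    if is_close and a.velocity == b.velocity:
        return counter + 1
    return 0


def pick_merge(counter: int, a: TrackedObject, b: TrackedObject):
    """Return (survivor, discarded) once the pair qualifies, else None.

    The older object survives and would receive the new bounding ellipse.
    """
    if counter <= MERGE_MIN_FRAMES:
        return None
    return (a, b) if a.age >= b.age else (b, a)
```

Running `pick_merge` repeatedly over all qualifying pairs gives the successive two-at-a-time merging described for groups of more than 2 objects.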
Following the characterization of each object's spatio-temporal behavior
pattern and other properties, such as texture (including but not limited to color),
shape, velocity, trajectory, and the like, against the pre-defined behavior
patterns and properties of "suspicious" objects, at step 214 the objects whose
behavior pattern and properties are similar to the "suspicious" behavior and
properties will generate an alarm trigger. Note should be taken that the
suspicious behavior patterns and suspicious properties could vary among
diverse applications.

Referring now to Fig. 12, the background update layer updates the
reference images for the next frame calculation. The method uses two reference
images: a) the long-term reference image, and b) the short-term reference
image. The long-term reference image describes the monitored scene as a
background image without any objects. The short-term reference image
includes both the background image and static objects. Static objects are
defined as objects that do not belong to the background, and are non-moving in
the monitored scene for a pre-defined period. In the preferred embodiment of
the invention the pre-defined period is defined as having a length of about 1 to
2 minutes. In other embodiments, different time unit values could be used. The
background updating process uses the outputs of all the previous layers to
generate a new short-term reference image. Each pixel that satisfies the
following conditions is updated: a) similar enough to the short-term reference
image (according to the score given in the pre-processing step), and b) not
included in an object. Pixels that do not satisfy the first condition but satisfy the
second condition for a long sequence of frames get updated as well. For every
fixed number of frames, a comparison is made between the current reference
images and the previous reference images, in order to check whether the changes made
to the reference images were correct. The long-term reference image is updated
from the short-term reference image in all pixels that are not contained in any
of the tracked objects. An object may change its status from dynamic to static if
it is not moving for a given period. It can change its status from static to
dynamic if the score thereof in the long-term reference image significantly
decreases. The background maintenance could be augmented by user-initiated
updates. Thus, the user can add several objects to the background in order to
help the system overcome changes in the background due to changes in the
location of a background object. For example, a "bench" object that was
dragged into the scene will be identified by the method as an object. The user
can classify the object as a neutral object and therefore can add the object to the
background in order to prevent the identification thereof as a dynamic or a
static object.

Still referring to Fig. 12, at step 198 the background draft frame is
updated. The background draft frame is continuously updated from the current
frame in all macro-blocks (16 x 16 pixels or the like) in which there was no
motion for several frames. Each pixel in the background draft is updated by
utilizing the following calculation:
(16):

$$BackgroundDraft(x, y) = BackgroundDraft(x, y) + \operatorname{sgn}\big(CurrentFrame(x, y) - BackgroundDraft(x, y)\big)$$
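Equation (16) moves each background draft pixel by at most one gray level per frame toward the current frame. A minimal Python sketch follows, assuming images stored as lists of rows of integers and a caller-supplied set of motionless macro-blocks; the function name and the `still_blocks` parameter are hypothetical, and how a block is declared motionless is outside the sketch.

```python
def sgn(value: int) -> int:
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def update_background_draft(draft, frame, still_blocks, block=16):
    """Apply equation (16) in place to every pixel of the motionless macro-blocks.

    draft, frame: 2D lists (rows of integer pixel values) of equal size.
    still_blocks: iterable of (block_row, block_col) indices with no motion.
    """
    height, width = len(draft), len(draft[0])
    for block_row, block_col in still_blocks:
        y_start, x_start = block_row * block, block_col * block
        for y in range(y_start, min(y_start + block, height)):
            for x in range(x_start, min(x_start + block, width)):
                draft[y][x] += sgn(frame[y][x] - draft[y][x])
    return draft
```

The unit step makes the draft robust to brief disturbances: a pixel that differs by 40 gray levels needs 40 motionless frames before the draft fully absorbs the change.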
When an object is identified as a static object, it is assumed that the
identified object already appears in the background draft. Thus, the pixels of
the object are copied from the background draft to the short-term reference
image. The short-term reference image is updated at step 200. The update of
each pixel in the short-term reference image is performed in accordance with the
values of the pixel in the SAD map and in the objects map. In the update
calculations the following variables are used:

SAD (x, y) = the SAD map value in the x, y pixel location
OBJECT (x, y) = the number of objects that the pixel in the x, y location
belongs to
BACKGROUND_COUNTER (x, y)
NOT_BACKGROUND_COUNTER (x, y)

The previously defined counters are updated by performing the following
sequence of instructions:

If (SAD (x, y) < 50) and (OBJECT (x, y) = 0) then, according to the
SAD map, the pixel belongs to the background and does not belong to any
object. Therefore, the value of BACKGROUND_COUNTER (x, y) is
incremented by one. If (SAD (x, y) > 50) and (OBJECT (x, y) = 0) then the pixel
does not belong to the background and does not belong to any object.
Therefore, the value of NOT_BACKGROUND_COUNTER (x, y) is incremented
by one. If OBJECT (x, y) is not equal to 0 then there is at least one object that the
pixel belongs to. Thus both counters are set to zero. Consequent to the updating
of the counters the pixels are updated in accordance with the counters. If
BACKGROUND_COUNTER (x, y) is greater than or equal to 15 then the pixel
at the x, y coordinates is updated and the counter is set to zero. If
NOT_BACKGROUND_COUNTER (x, y) is greater than or equal to 1000 then
the pixel at the x, y coordinates is updated and the counter is set to zero.
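The counter logic above can be sketched for a single pixel as follows. The function name is hypothetical, and two points are assumptions because the text leaves them open: the value written into the short-term reference image on an update is supplied by the caller as `new_value`, and a SAD value of exactly 50 changes neither counter.

```python
SAD_THRESHOLD = 50
BACKGROUND_LIMIT = 15
NOT_BACKGROUND_LIMIT = 1000


def update_short_term_pixel(sad, obj_count, bg_cnt, not_bg_cnt,
                            ref_value, new_value):
    """Process one frame for one pixel.

    Returns the tuple (ref_value, bg_cnt, not_bg_cnt) after the update.
    """
    if obj_count != 0:
        # The pixel belongs to at least one object: both counters restart.
        bg_cnt, not_bg_cnt = 0, 0
    elif sad < SAD_THRESHOLD:
        bg_cnt += 1        # looks like background, outside every object
    elif sad > SAD_THRESHOLD:
        not_bg_cnt += 1    # differs from background, outside every object
    # sad == SAD_THRESHOLD with no object: unspecified in the text, left unchanged.

    if bg_cnt >= BACKGROUND_LIMIT:
        ref_value, bg_cnt = new_value, 0
    if not_bg_cnt >= NOT_BACKGROUND_LIMIT:
        ref_value, not_bg_cnt = new_value, 0
    return ref_value, bg_cnt, not_bg_cnt
```

The asymmetric limits (15 versus 1000 frames) mean that background-like pixels are refreshed quickly, while a persistent unexplained difference is absorbed only after a long wait.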

At step 202 the long-term reference image is updated by copying all the
pixels that are not bounded by any object's ellipse from the short-term
reference image to the long-term reference image.
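Step 202 can be pictured as a masked copy. In the sketch below each ellipse is assumed, purely for illustration, to be axis-aligned and given as a hypothetical tuple (cx, cy, rx, ry); the ellipses of the tracked objects may carry an orientation that this sketch ignores.

```python
def inside_ellipse(x, y, cx, cy, rx, ry):
    """True when pixel (x, y) lies within the axis-aligned ellipse."""
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0


def update_long_term(long_term, short_term, ellipses):
    """Copy every pixel outside all object ellipses from short_term into long_term."""
    for y, row in enumerate(short_term):
        for x, value in enumerate(row):
            if not any(inside_ellipse(x, y, *ellipse) for ellipse in ellipses):
                long_term[y][x] = value
    return long_term
```

Pixels covered by a tracked object keep their previous long-term value, so the long-term image continues to show the scene without that object.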

In the short-term reference image the score of each static object is
measured. The score is compared to the score obtained when the object
became static. If the current score is significantly lower than the previous score
it is assumed that the static object has started moving. The status of the object
is set to "dynamic" and the pixels of the object are copied from the long-term
reference image to the short-term reference image. Thus, the object will be
adjusted for the next frame during the adjustment process.
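The static-to-dynamic transition can be sketched as follows. The text does not quantify "significantly lower", so the `drop_ratio` parameter is a placeholder assumption, as are the dictionary keys used to describe an object.

```python
def reactivate_if_moved(obj, current_score, short_term, long_term, drop_ratio=0.5):
    """Switch a static object back to dynamic when its score has dropped.

    obj: dict with keys 'status', 'static_score' (the score recorded when the
    object became static) and 'pixels' (a list of (x, y) positions).
    Returns True when the object was switched to dynamic.
    """
    if obj["status"] != "static":
        return False
    if current_score >= drop_ratio * obj["static_score"]:
        return False  # score still close to the value recorded at the transition

    obj["status"] = "dynamic"
    # Remove the object from the short-term reference by restoring the background.
    for x, y in obj["pixels"]:
        short_term[y][x] = long_term[y][x]
    return True
```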

The applications that could utilize the system and method of object
tracking will now be readily apparent to persons skilled in the art. Such can
include crowd control, people counting, offline and online investigation
tools based on the events stored in the database, assisting in locating lost
luggage (lost prevention) and restricting access of persons or vehicles to certain
zones, unattended luggage detection, "suspicious" behavior of persons or other
objects and the like. The applications are both for city centers, airports, secure
locations, hospitals, warehouses, border and other restricted areas or locations
and the like.

It will be appreciated by persons skilled in the art that the present
invention is not limited to what has been particularly shown and described
hereinabove. Rather the scope of the present invention is defined only by the
claims, which follow.
CLAIMS

WHAT IS CLAIMED IS: 

1. An apparatus for the analysis of a sequence of captured images
covering a scene for detecting and tracking of moving and static
objects and for matching the patterns of object behavior in the
captured images to object behavior in predetermined scenarios, the
apparatus comprising the elements of:

at least one image sequence source for transmitting a sequence of
images to an object tracking program; and

an object tracking program comprising;

a pre-processing application layer for constructing a difference
image between a currently captured video frame and at least one
previously constructed reference image showing the background;

an objects clustering application layer for generating at least one
new or updated object from the difference image; and

a background updating application layer for updating at least one
reference image prior to processing of a new frame.

2. The apparatus of claim 1 wherein the object tracking program further
comprises a configuration application layer for initializing the
apparatus in accordance with user pre-defined parameters.

3. The apparatus as claimed in claim 2 wherein the configuration 
application layer comprises a reference image constructor, the 
reference image constructor comprising a current frame capture 
module for assigning a captured image as the reference image. 

4. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a reference image constructor, the
reference image constructor comprising a reference image loading
module for loading an existing reference image located on file as the
reference image.

5. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a reference image constructor, the
reference image constructor comprising a reference image learning
module for generating a reference image from a consecutive sequence
of captured images.

6. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a timing parameters definer for providing
time setting information.

7. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises the element of a visual parameters definer,
the visual parameters definer for providing the geometry of the scene.

8. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a camera tilt setting module for deriving camera tilt
in accordance with measurements of an object located at different
locations in the scene.

9. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a camera zoom setting module for defining the
maximum, the minimum and the typical size of the objects to be
tracked.

10. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a region location definition module for defining the
location of at least one region-of-interest within the scene.

11. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a region type definition module for defining a
region of interest in the scene.

12. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises an alarm type definition module for defining a
region of interest as a trigger alarm region.

13. The apparatus as claimed in claim 1 wherein the pre-processing
application layer comprises:

a current frame handler for obtaining a captured frame;

a short term reference image handler for loading an existing
short-term reference image;

a long term reference image handler for loading an existing long-term
reference image;

a pre-processor module for generating new short term and long
term reference images;

a short term difference image handler for updating the short term
reference image with the new short term reference image; and

a long term reference image handler for updating the long term
reference image with the new long term reference image.

14. The apparatus of claim 13 wherein the short and long term reference
image handlers further provide the moments of the short and long term
reference images.

15. The apparatus as claimed in claim 1 wherein the clustering application
layer comprises:

an object merger module for correcting clustering errors by
successive merging of at least two partially overlapping objects having
the same motion vector for a pre-defined period of time;

an objects group builder module for creating at least one group of at
least two close objects;

an object group adjuster module for determining the spatial
parameters of each object in the at least one group; and

a new objects constructor module for constructing a new object
based on the difference image.

16. The apparatus as claimed in claim 15 wherein the clustering
application layer further comprises an object searcher module for
locating discarded objects having spatial parameters similar to the
parameters of the new object.

17. The apparatus as claimed in claim 15 wherein the clustering
application layer further comprises a Kalman filtering module.

18. The apparatus as claimed in claim 15 wherein the clustering
application layer further comprises an object status updater module for
modifying the status of an object.

19. The apparatus as claimed in claim 1 wherein the objects clustering
application layer generates at least one new or updated object from the
difference image and at least one existing object.

20. The apparatus of claim 1 wherein the object tracking program further
comprises a scene characterization application layer for describing the
scene and for triggering an alarm, based on comparing a behavior
pattern of the at least one existing object to the at least one pre-defined
behavior pattern or characteristic.

21. The apparatus as claimed in claim 20 wherein the scene
characterization application layer comprises an object movement
measurement module for analyzing changes in the parameters of the at
least one existing object and determining the at least one existing
object movement.

22. The apparatus as claimed in claim 20 wherein the scene
characterization application layer comprises an object merger module
for correcting errors in the at least one existing object and an alarm
triggering mechanism for determining whether an alarm is to be
triggered based on the at least one existing object patterns.

23. The apparatus claimed in claim 1 wherein the background update
application layer comprises a background draft updater module for
updating the at least one reference image from the currently captured
video frame.

24. The apparatus claimed in claim 23 wherein the background update
application layer further comprises a short term reference image
updater module and a long term reference image updater module for
maintaining the updated short term and long term reference images.

25. The apparatus claimed in claim 1 further comprising an object
tracking control database, the database comprising;

at least one long term reference image, the at least one long term
reference image comprising a background image of the scene without
dynamic or static objects tracked by the apparatus;

at least one short term reference image, the at least one short term reference
image comprising a background image of the scene with the dynamic
or static objects tracked by the apparatus.

26. The apparatus claimed in claim 25 wherein the object tracking control
database further comprising;

an objects table comprising a list of the dynamic or static objects
tracked by the apparatus, each object is associated with object data and
object meta data; and

a distance short term map and a distance long term map showing the
short-term and long-term reference images; and

a background draft comprising a changing image of the scene and
making up the reference image.

27. The apparatus claimed in claim 26 wherein the object tracking control
database further comprising a discarded objects archive for storing
discarded objects.

28. A method for the analysis of a sequence of captured images showing
a scene for detecting and tracking of at least one moving or static
object and for matching the patterns of the at least one object behavior
in the captured images to object behavior in predetermined scenarios,
the method comprising the steps of:

capturing at least one image of the scene;

pre-processing the captured at least one image and generating a short
term difference image and a long term difference image;

clustering the at least one moving or static object in the short term
difference and long term difference images and generating at least one
new object and at least one existing object.

29. The method as claimed in claim 28 further comprising the steps of
characterizing the visual scene and updating the background reference
image by updating the short term reference frame and the long term
reference frame.

30. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program for providing at least one
reference image, at least one timing parameter and at least one visual
parameter.

31. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program for setting at least one region
of interest.

32. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program, said step comprises the steps
of:

constructing an initial short term reference image and an initial long
term reference image;

providing the object tracking program with the initial short term
reference image and the initial long term reference image;

providing timing parameters; and assigning visual parameters.

33. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image from a captured image.

34. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image from internally stored images.

35. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image through a learning process utilizing a set of
sequentially ordered and captured images.

36. The method as claimed in claim 28 wherein the step of pre-processing
comprises the steps of:

obtaining the short term reference image;

obtaining the long term reference image;

obtaining a currently captured image;

generating a short term difference image from the short term
reference frame and the currently captured image; and

generating a long term difference image from the long term
reference frame and the currently captured image.

37. The method as claimed in claim 28 wherein the step of clustering
comprises the steps of:

building groups of clustered objects from at least two dynamic or
static objects in accordance with the relative locations of each of the at
least two dynamic or static objects;

adjusting the parameters of each of the at least two dynamic or static
objects clustered within each group;

updating the parameters and status of each of the at least two
dynamic or static objects.

38. The method of claim 28 wherein the step of clustering comprises the
steps of predicting the motion of the at least one moving object by
predictive filtering and adapting the parameters of the at least one
moving object.

39. The method as claimed in claim 37 wherein the step of building
groups of clustered objects comprises the steps of:

measuring the distance between each of the at least two dynamic or
static objects;

determining neighborhood relations between each of the at least two
dynamic or static objects in accordance with the results of the
distance measurement;

clustering the at least two dynamic or static objects in accordance
with the determined neighborhood relations into distinct object
groups; and

adjusting the distinct object groups in order to determine the optimal
spatial parameters of each of the at least two dynamic or static objects
in the distinct object groups.

40. The method as claimed in claim 38 wherein the step of adapting the
parameters of the at least one moving object comprises the steps of
locating the center of the at least one moving object; locating the
boundary points constituting the boundary line of the at least one
moving object; re-calculating the location of the center of the at least
one moving object; and inserting the at least one moving object into an
objects table.

41. The method as claimed in claim 40 further comprising the steps of
adjusting the spatial parameters of the at least one moving object and
retrieving similar objects to the at least one moving object from a
discarded object archive.

42. The method as claimed in claim 29 wherein the step of characterizing
comprises the steps of: measuring the movement of the at least one
moving object to determine the behavior of the at least one moving
object; merging spatially overlapping objects; generating an alarm
trigger in accordance with the results of the behavior of the at least
one moving object or in accordance with the spatial or visual
parameters of the at least one moving object.

43. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the texture of the object.

44. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the shape of the object.

45. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the velocity of the at least one moving
object.

46. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the trajectory of the at least one moving
object.

47. The method as claimed in claim 28 wherein the step of updating the
background comprises the steps of: updating the background draft;
updating the short term reference image; and updating the long term
reference image.